AppraisalCloudPCT: A Computational Model of Emotions for Socially Interactive Robots for Autistic Rehabilitation

Computational models of emotions can not only improve the effectiveness and efficiency of human-robot interaction but also coordinate a robot to better adapt to its environment. When designing computational models of emotions for socially interactive robots, especially for robots for people with special needs such as autistic children, one should take into account the social and communicative characteristics of such groups of people. This article presents a novel computational model of emotions, called AppraisalCloudPCT, that is suitable for socially interactive robots adopted in autistic rehabilitation; to the best of our knowledge, it is the first computational model of emotions built for robots that can satisfy the needs of a special group of people such as autistic children. To begin with, some fundamental and notable computational models of emotions (e.g., OCC, Scherer's appraisal theory, PAD) that have a deep and profound influence on significant models for socially interactive robots (e.g., PRESENCE, iGrace, xEmotion) are revisited. Then, a comparative assessment between our AppraisalCloudPCT and five other significant models for socially interactive robots is conducted. Great efforts have been made in building our proposed model to meet all six criteria for comparison, by adopting the appraisal theories on emotions, perceptual control theory on emotions, a component model view of appraisal models, and cloud robotics. Details of how to implement our model in a socially interactive robot we developed for autistic rehabilitation are also elaborated in this article. Future studies should examine how our model performs in different robots and in more interactive scenarios.


Introduction
Before probing into the scope of computational models of emotions, it is necessary to understand the emotion terminology for the purpose of clarity. To begin with, six terms (i.e., affect, appraisal, cognition, emotion, feeling, and mood) are defined in [1] as follows: (1) affect is any information (feeling, mood, or emotion) used to inform one or more cognitive processes; (2) appraisal is defined as the process of making judgments (appraisals) about the relationship between an individual's beliefs, desires, and intentions [2] and perceived events; (3) cognition is defined as the mental processes associated with the comprehension, acquisition, and alteration of knowledge, such as planning, learning, inference, and recall; (4) emotion is defined as cognitive data generated by internal and external events, owing to concepts and states, and is used to inform responses; (5) feeling is defined as the subjective experience of an emotion or of a series of emotions; (6) mood is the general state of an emotion that lasts longer and is less variable than the emotion itself.
As for the terminology computational models, according to Simon [3], computational models can simulate a system by drawing results from the system's premises and predicting its behavior (as in a weather forecast system). Furthermore, it was argued that, given some premises and appraisal operations, computational models of emotion can predict and potentially produce behavior [4]. According to Marsella et al. [5], computational models of emotions play various roles in research and applications: (1) from the perspective of psychological research, computational models better support our understanding of human emotional processes; (2) from the perspective of AI and robotics research, modeling of emotion can influence the reasoning process or coordinate an agent or robot to better adapt to its environment; (3) from the perspective of HCI research, modeling of emotion improves the efficiency and effectiveness of interaction, as well as enhancing user experience.
To investigate the importance of affective processes in the social development and socially situated learning of robots coexisting with humans in the human environment, computational models of emotions for socially interactive robots were introduced [6]. According to Breazeal et al. [7], to effectively engage in emotion-based interactions, robots must possess three kinds of capabilities: (1) to recognize and interpret human emotional signals, (2) to operate by means of internal emotional models that are often based on theories in psychology, and (3) to communicate their affective states to others. Since the emotional responses of a robot can be determined by the robot's computational model of emotion, which depends on the interaction of its internal cognitive-affective state with the external environment [7], these internal emotional models are crucial for human-robot interactions [8].
Many people with autism spectrum disorder (ASD) have characteristics such as difficulty in social communication (e.g., poor perception of nonverbal cues, including facial expressions and gestures in body language, as well as inappropriate expressions), limited and repetitive behaviors, and narrow, focused interests (Diagnostic and Statistical Manual of Mental Disorders, 5th Edition: DSM-5 [9]). It is increasingly necessary to introduce socially interactive robots as an auxiliary means for the treatment and rehabilitation of ASD, so as to improve the diversity of treatment and the effectiveness of rehabilitation training and to mitigate the medical staff shortages in mainland China and the rest of the world [10]. A number of treatment and training targets, such as triadic interactions, joint attention (JA), turn-taking activities, improving eye contact and self-initiated interactions, assisting the diagnostic process, emotion recognition, and imitation, can be achieved by robotics for autism [11]. Moreover, robots have demonstrated their potential in 24 of 74 ASD objectives across eight domains, including preschool skills for children with ASD, motor experiences and skills, social/interpersonal interactions and relations, emotional wellbeing, functioning in daily reality, sensory experiences and coping, play, and communication [12].
However, better utilization of robots and HCI for autism intervention in the clinical setting does not necessarily lead to robots that are clinically more useful for ASD intervention [13]. This is partly due to the difficulty ASD patients have in understanding the emotional and mental states of others, a feature of the autism spectrum conditions (ASC) [14]. ASC patients show symptoms of stunted development in their ability to recognize and differentiate between different emotional expressions [15]. In addition, children with ASD may stay focused on objects of interest for a very long period of time, failing to deliver rehabilitation training outcomes. Hence, if robots are able to follow the gaze, they may be deployed for human-robot interaction tasks, including rehabilitation training for autism [16,17].
Consequently, when designing computational models of emotions for socially interactive robots, especially for robots for a special group of people such as autistic children, one should take into account the social and communicative characteristics of such a group of people. There are four world-leading research groups with pioneering work in promoting social robots as useful tools in autism therapy: the Kerstin Dautenhahn group [18-20], the Ayanna Howard group [21,22], the Maja Matarić group [23-25], and the Bram Vanderborght group [26,27]. However, none of the four research groups have designed or applied computational models of emotions for the social robots used in their autism therapy studies. Therefore, this article proposes a novel computational model of emotions that is suitable for, and can be implemented in, socially interactive robots, especially robots adopted in autistic rehabilitation.
The contributions of this article are threefold. First and most importantly, this article presents a novel computational model of emotions, called AppraisalCloudPCT, that is suitable for socially interactive robots used in autistic rehabilitation; to the best of our knowledge, it is the first computational model of emotions built for robots that can satisfy the needs of a special group of people such as autistic children. Second, such a computational model of emotions can make human-robot interaction more interactive and effective, as it takes into account a user's intention and attention and can coordinate the robot to make an appropriate response to the surrounding emotional environment. Third, such a computational model of emotions can achieve a high degree of simulation of human emotions and is computationally implementable in various robots, as it is based on the appraisal theories on emotions [28-31], perceptual control theory on emotions [32], a compositional view of model building [5], and cloud robotics [33,34]/cloud medical robots [35-38].
The rest of the paper is organized as follows. Section 2 revisits some fundamental and notable computational models of emotions that have a deep and profound influence on building significant computational models of emotions for socially interactive robots, which are reviewed in Section 3. Section 4 presents our proposed computational model of robotic emotions, called AppraisalCloudPCT, and its implementation in a social robot for autistic rehabilitation is elaborated in Section 5. Finally, the conclusions, limitations, discussion, and future work are given in Section 6.

Classical Computational Models of Emotions
The development of computational modeling of emotion and cognition has been accelerated by recent human cognitive and psychological studies related to emotion [1]. For example, according to Marsella et al. [5], concepts drawn from AI have been cast in the appraisal theory of several computational models, including the belief-desire-intention (BDI) model, fuzzy logic, knowledge representation, Q-learning, planning, neural networks, and decision-making. Marsella et al. [5] used a "family tree" figure of a number of theoretical traditions and significant models (e.g., rational, anatomical, dimensional, and appraisal) to illustrate from which traditions these models stem. Instead of using a "family tree," Lin et al. [1] used two tables to review the fundamental theoretical traditions of emotion and cognition and the effects modeled by some well-known computational models.
As this article does not focus on the interaction between emotion and cognition as [1] did, cognition theories such as the BDI model are not reviewed here. Rather, appraisal theories such as OCC (the affect-derivation model proposed by Ortony et al. [39]) and Scherer's appraisal theory [40], as well as dimensional theories of emotion such as PAD [41], are revisited here as classical theoretical traditions listed in [5]. Other theories, such as perceptual control theory on emotions [32] and a compositional view of model building [5], which were not listed in the "family tree" in [5], are also revisited.

OCC.
Ortony et al. [39] proposed an appraisal theory, i.e., the OCC theory, in their book "The Cognitive Structure of Emotions," in which 22 emotions are categorized based on the appraisal of intensity (arousal) and pleasure/displeasure (valence). The OCC theory offers a structure of variables, such as the familiarity of an object or the likelihood of an event, that determine the intensity of the emotion types. Based on what is being appraised, the OCC theory breaks the valence appraisal down into three categories: praiseworthiness (of an action), like/dislike (of an entity), and desirability (of an event). In addition, when some branches are combined, well-being/attribution compound emotions (e.g., remorse and gratitude) concerning the consequences of events caused by an agent's actions are formed.
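To make the branching concrete, the following sketch maps the three valence appraisals onto representative OCC emotion types; the function names, thresholds, and the compound rule shown are our own illustrative simplification, not a normative implementation of the theory.

```python
# Illustrative (non-normative) sketch of OCC valence branching.
# The emotion labels follow OCC categories; thresholds are assumptions.

def occ_emotion(kind: str, valence: float, about_self: bool = True) -> str:
    """Map an appraisal to a representative OCC emotion type.

    kind: 'event' (desirability), 'action' (praiseworthiness),
          or 'entity' (like/dislike); valence in [-1, 1].
    """
    if kind == "event":                       # consequences of events
        return "joy" if valence >= 0 else "distress"
    if kind == "action":                      # actions of agents
        if about_self:
            return "pride" if valence >= 0 else "shame"
        return "admiration" if valence >= 0 else "reproach"
    if kind == "entity":                      # aspects of objects
        return "love" if valence >= 0 else "hate"
    raise ValueError(kind)

# A compound well-being/attribution emotion combines two branches, e.g.,
# a blameworthy own action together with an undesirable event -> remorse.
def occ_compound(action_valence: float, event_valence: float) -> str:
    if action_valence < 0 and event_valence < 0:
        return "remorse"
    if action_valence >= 0 and event_valence >= 0:
        return "gratification"
    return "mixed"

print(occ_emotion("event", -0.7))        # -> 'distress'
print(occ_compound(-0.5, -0.7))          # -> 'remorse'
```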
A specification with three elements (i.e., the type specification stating the conditions that trigger an emotion of the type, a list of tokens, and a list of variables affecting intensity) is given for each of the 22 emotion types. The list of tokens specifies which emotional words can be classified as belonging to the emotion type in question.
Negative categories (hate, fear, distress, anger, disappointment, and remorse) and positive categories (love, relief, hope, joy, gratitude, and pride) from the OCC model were proposed for use in Ortony [42], in order to reduce the complexity of developing believable characters. However, for a character using facial expressions only, even these categories might be too many, as argued by Bartneck [43], who proposed splitting the emotion process of the OCC model into five phases. Additionally, to resolve the ambiguities identified in the OCC model, Steunebrink et al. [44] proposed a new, inheritance-based view of the emotional logic structure of the OCC model.

Scherer's Appraisal Theory.
Appraisal theories of emotion, first introduced by Arnold [45] and Lazarus [2,46], are rooted in Aristotle, Descartes, Spinoza, and Hume [47]. Ellsworth, Scherer, and their students actively developed them in the early 1980s [28,40,48-50] (see the historical reviews by Scherer [40,51]). Appraisal theories of emotion relate emotions to the more immediate cognitive assessment of coping capabilities, causal attribution, and evaluation of meaning [52]; the evolutionary theories of emotion, by contrast, relate emotions to biological adaptation in the distant past. Clore and Ortony [53] treated appraisals as the psychological representations of emotional significance for the person experiencing the emotion. Scherer [51] reviewed a central tenet of appraisal theory and concluded that emotions are triggered and distinguished, through a set of dimensions or criteria, based on one's subjective evaluation of the personal significance of events, objects, or situations.
Scherer [31] used stimulus evaluation checks (SECs), defined in the component process model of emotion (CPM) [40,48,54-56], to represent the minimum set of dimensions or criteria sufficient and necessary to distinguish the essential families of emotional states. In the framework of the CPM (see Figure 1 in [50]), emotion is defined as an episode of interrelated, synchronized changes in the states of most if not all of the five organismic subsystems in response to the assessment of external or internal stimuli relevant to the organism's primary concerns [40]. According to the CPM, emotion is considered a theoretical structure consisting of five components, each corresponding to one of five unique functions [50]. In the light of the CPM, SECs are processed in a fixed sequential order, with four stages in the appraisal process, each corresponding to one of the four appraisal objectives, i.e., relevance, implications, coping potential, and normative significance [47]. Moreover, the CPM assumes that changes in internal or external events maintain a recursive appraisal process until the monitoring subsystem sends a signal to terminate or adjust the stimulation that initially triggered the appraisal episode [40,50].
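To illustrate the fixed-order sequential-check idea, a toy sketch follows; the four stage functions and their decision rules are placeholder assumptions of ours, far coarser than the SECs the CPM actually specifies.

```python
# Minimal sketch of CPM's four fixed-order appraisal objectives.
# Each stage consumes the stimulus and a context; the episode can end
# early when the event turns out not to be relevant.

from typing import Callable, Dict

Stage = Callable[[dict, dict], dict]

def relevance(event, ctx):        # novelty, intrinsic pleasantness, goal relevance
    return {"relevant": abs(event["goal_impact"]) > 0.1}

def implications(event, ctx):     # outcome probability, conduciveness, urgency
    return {"conducive": event["goal_impact"] > 0}

def coping_potential(event, ctx): # control, power, adjustment
    return {"can_cope": ctx.get("resources", 0.5) > 0.4}

def normative_significance(event, ctx):  # internal/external standards
    return {"norm_compatible": event.get("norm_violation", 0.0) < 0.5}

STAGES: Dict[str, Stage] = {
    "relevance": relevance,
    "implications": implications,
    "coping_potential": coping_potential,
    "normative_significance": normative_significance,
}

def appraise(event: dict, ctx: dict) -> dict:
    profile = {}
    for name, check in STAGES.items():   # fixed sequential order
        profile[name] = check(event, ctx)
        if name == "relevance" and not profile[name]["relevant"]:
            break                        # irrelevant events end the episode early
    return profile

print(appraise({"goal_impact": -0.8, "norm_violation": 0.9}, {"resources": 0.2}))
```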
In summary, appraisal theories of emotion not only can be used to investigate the origin of emotion but can also be used to account for the emotions of people experiencing feelings, using the Geneva Emotion Wheel (see the second version in [57]) or the Geneva Expert System on Emotion (https://www.unige.ch/cisa/properemo/gep17/intro1.php). In addition, facial expressions and physiological processes may change during the evaluation or appraisal of the personal significance of a certain object or situation, but which discrete emotion is experienced can be determined by the specific profile of appraisal (i.e., the antecedent of the emotion), according to Niedenthal et al. [52]. As a result, two individuals can experience different emotions despite being subjected to the same event or stimulus, which is consistent with the appraisal theories of emotion.

PAD.
According to dimensional theories of emotion, emotion and other affective phenomena should be classified and labeled, in the manner of social constructions, as points in a continuous (usually two- or three-dimensional) space rather than as discrete entities [41,58-60]. The historical development of dimensional theories of emotion can be traced back to James [61], Schachter and Singer [62], Russell [58], and Barrett [59]. Russell [63] suggested replacing discrete emotions with core affect, owing to cross-cultural differences in attributing specific emotions to facial expressions. Scarantino [64] described core affect as follows: "Core affect, understood as the category comprising the set of all possible valence and arousal combinations on the circumplex, differs from discrete emotions in three crucial ways: it is ubiquitous, it is objectless, and it is primitive" ([64], p. 948).
According to Russell ([58], p. 154), a person is in exactly one affective state at any time, and the possible core affective states can be characterized in a space of continuous and broad dimensions. Mehrabian and Russell's PAD model [41] consists of three dimensions corresponding to pleasure (a measure of valence), arousal (a measure of the level of affective activation), and dominance (a measure of control or power), respectively. Many computational models of emotion were inspired by the PAD model, such as WASABI [65], a PAD-based model of core affect incorporating Scherer's sequential-checking theory.
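To make the dimensional view concrete, a core-affect state can be stored as a single PAD point and related to discrete labels only by proximity; the anchor coordinates below are rough values we assume for the sketch, not canonical PAD measurements.

```python
import math

# Hypothetical PAD anchors for a few discrete labels (values assumed for
# illustration; each axis is in [-1, 1]: pleasure, arousal, dominance).
ANCHORS = {
    "joy":     ( 0.8,  0.5,  0.4),
    "anger":   (-0.6,  0.6,  0.3),
    "fear":    (-0.6,  0.6, -0.6),
    "sadness": (-0.6, -0.4, -0.4),
    "calm":    ( 0.4, -0.5,  0.2),
}

def nearest_label(p, a, d):
    """Map one core-affect point to the closest discrete-emotion anchor."""
    return min(ANCHORS, key=lambda k: math.dist((p, a, d), ANCHORS[k]))

print(nearest_label(0.7, 0.4, 0.2))  # -> 'joy'
```

Note how the state itself stays continuous and objectless, consistent with the core-affect view; discrete labels enter only as a read-out convention.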

Perceptual Control Teory on Emotions.
Perceptual control theory (PCT) [66] is a theory of how living organisms control their inputs instead of their outputs. The idea of PCT can be attributed to [67]: "What we have is a circuit, not an arc or broken segment of a circle. This circuit is more truly termed organic than reflex, because the motor response determines the stimulus, just as truly as sensory stimulus determines the movement" ([67], p. 363). PCT was developed by William T. Powers, a physicist/engineer, in the 1950s. He first published it in [68], then formalized it in [66], and revised it in his latest work [32]. According to PCT, behavior is defined as (merely) the control of perception, through some principles: (1) negative feedback leads to control; (2) a specific hierarchical organization of loops leads to control; (3) perception can only be controlled by individuals themselves; (4) controlling others causes conflicts; (5) "dysfunction" can be caused by conflicts between high-level control systems; (6) a specific learning mechanism, reorganization, reestablishes control. Moreover, PCT states that control systems are organized in a hierarchy to manage complex goals, from controlling low-level motor behavior to regulating high-level psychological and social behavior, with each layer defining the reference signal for the layer below [66]. The levels of the hierarchical perceptual control theory hypothesized by Powers are, respectively, 1st order: intensity; 2nd order: sensation/vector; 3rd order: configuration; 4th order: transitions; 5th order: sequence; 6th order: relationships; 7th order: program; 8th order: principles; and 9th order: system concepts.
In addition, Powers explained how emotions are generated in a PCT model in his paper [69]: (1) emotion is defined as a product of brain activity, as the brain regulates the neurochemical reference signals sent from the hypothalamus through the pituitary gland to all major organ systems; (2) as perceivable changes of physiological state result from disturbances calling control systems into action, emotion is a direct response to the disturbance, whose presence can be known instantly by one's conscious awareness; (3) in closed-loop terms, an experienced emotion is caused by "feelings," which are a collection of inputs and perceptions, and it in turn outputs a change in the physiological state (e.g., vasoconstriction, respiration rate, metabolism, heart rate, and motor preparedness); (4) an emotion is caused by a reference signal in some high-level system specifying a more or less intended amount of some perception, not by external factors; (5) in a high-level control system, a zero error signal results when the perceived current state matches the specified reference signal, while a mismatch causes a nonzero error signal, so action must be taken to correct the error, causing emotion; (6) emotional behavior and emotional thinking can be caused by an error signal immediately resulting from a change of reference signal or a change of a disturbance; (7) the strongest negative emotions are related to the largest errors and to the errors that human beings most feel the need to correct, and their maximum intensity and duration appear when some internal or external factors prevent us from taking action to correct the errors; (8) when the degree of error is significant and important to them, human beings will use emotional words, leading to awareness of the cause, while small errors mean that no emotional words are used, leading to failure to identify the cause.
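Points (1)-(8) boil down to a loop in which perception is compared with a reference, and the resulting error both drives corrective action and indexes emotion. The sketch below is our minimal reading of that loop; the gain, the disturbance, and the use of the absolute error as emotion intensity are assumptions for illustration.

```python
# One-level PCT loop (our illustrative reading; gain and step count assumed).
# The controlled variable is a perception, not an output: action is whatever
# reduces the error between the reference and the perceived input.

def simulate(reference, disturbance, steps=50, gain=0.6):
    perception, log = 0.0, []
    for _ in range(steps):
        error = reference - perception                  # (5): mismatch -> error signal
        action = gain * error                           # behavior corrects the error
        perception = perception + action + disturbance  # environment feeds back
        log.append(abs(error))                          # (7): intensity tracks error size
    return log

calm = simulate(reference=1.0, disturbance=0.0)
frustrated = simulate(reference=1.0, disturbance=-0.5)  # correction is blocked
print(round(calm[-1], 3), round(frustrated[-1], 3))
```

The blocked run settles at a large persistent error, mirroring point (7): the strongest emotions accompany errors that cannot be corrected.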
To summarize, PCT on emotions defines emotions as one aspect of the wholly integrated hierarchy of control. PCT on emotions involves the notions of embodiment (e.g., emotion is defined as a product of brain activity), adaptation (e.g., the "general adaptation syndrome" in the case of attack or avoidance behavior), and appraisal (e.g., evaluating the significance of an error signal). Consequently, PCT on emotions is compatible to some extent with other theories, such as the theory of embodied emotion, evolutionary theories, and cognitive-appraisal theories.

A General Architecture of Computational Models of Emotion.
Marsella et al. [5] argued that a number of the component "submodels" integrated into the computational models listed in the "family tree" are not clearly delineated. They proposed that, by disassembling the "submodels" along appropriate joints, a large number of the significant differences between computational models of emotion can be decomposed into a few design choices.
They then proposed a component model view of appraisal models, conceptualizing emotions as a set of linked component models (see Figure 2 in [5]) and the relationships between these components. Terminology associated with each of the component models listed in the appraisal architecture was also introduced: (1) person-environment relationship: this term, introduced by Lazarus [2], refers to some expression of the relationship between the agent and its environment; (2) appraisal-derivation model: such a model converts some representation of the relationship between a person and the environment into a set of appraisal variables; (3) appraisal variables: a set of specific judgments generated as a result of an appraisal-derivation model, which can be used by an agent to produce different emotional responses; (4) affect-derivation model: the mapping from appraisal variables to affective state is processed in this model; once a pattern of appraisals has been determined, this model specifies how the individual will react emotionally; (5) affect-intensity model: in this model, a specific appraisal results in the strength of the emotional response, which is usually calculated by an intensity equation using a subset of the appraisal variables, such as desirability and likelihood; (6) emotion/affect: affect could be a set of discrete emotions, a discrete emotion label, core affect in a continuous dimensional space, or even a combination of these; (7) affect-consequent model: this model maps affect (or its antecedents) onto behavioral or cognitive changes, which are determined by behavior-consequent models and cognitive-consequent models, respectively. Behavior-consequent models summarize how affect (e.g., emotion, feeling, and mood) alters an agent's observable physical behavior, such as facial expressions, while cognitive-consequent models determine how affect changes the nature or content of cognitive processes, such as an agent's beliefs, desires, and intentions.
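Read as software, the architecture is a pipeline from the person-environment relationship to behavioral and cognitive consequences. The stub pipeline below is our illustration; each function is a trivial stand-in for a full submodel, and the intensity rule follows the desirability-times-likelihood example mentioned above.

```python
# Skeleton of the component pipeline in [5] (our illustrative composition;
# each stage is a trivial stand-in for a full submodel).

def appraisal_derivation(person_env: dict) -> dict:
    # person-environment relationship -> appraisal variables
    return {"desirability": person_env["goal_impact"],
            "likelihood": person_env["certainty"]}

def affect_derivation(appraisal: dict) -> str:
    # appraisal variables -> affect (a discrete label, for simplicity)
    if appraisal["desirability"] >= 0:
        return "hope" if appraisal["likelihood"] < 0.9 else "joy"
    return "fear" if appraisal["likelihood"] < 0.9 else "distress"

def affect_intensity(appraisal: dict) -> float:
    # one common choice: intensity ~ desirability x likelihood
    return abs(appraisal["desirability"]) * appraisal["likelihood"]

def affect_consequent(emotion: str, intensity: float) -> dict:
    # affect -> behavioral and cognitive consequences
    return {"expression": emotion if intensity > 0.3 else "neutral",
            "cognitive_bias": emotion}

variables = appraisal_derivation({"goal_impact": -0.7, "certainty": 0.8})
emotion = affect_derivation(variables)
print(affect_consequent(emotion, affect_intensity(variables)))
```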
Three rather different systems, i.e., EMA [70], ALMA [71], and FLAME [72], were characterized in [5] to highlight the conceptual similarities and differences between emotion models by using the component model view of appraisal models. Marsella et al. [5] argued that adopting a component view of model building makes it possible to empirically assess the capabilities or validity of alternative algorithms that implement the model and to conduct meaningful comparisons (i.e., of similarities and differences) between systems.
To sum up, Marsella et al.'s compositional view of model building [5] stresses that emotional models are often composed of individual "submodels" or "smaller components" that are often shared and can be matched, mixed, or excluded from any given implementation. According to Marsella et al. [5], components may be evaluated and subsequently abandoned or improved through ongoing evaluations before the final version of the model is designed.

Kismet's Cognitive-Affective Architecture.
With four perceptual modalities (facial display, body posture, gaze control, and speech), an expressive robot called Kismet [73] was developed at MIT to explore the nature of social interaction and communication between humans and robots. In particular, insights from psychology and ethology [8] inspired extensive computational modeling to explore the social interaction between caregiver and infant.
Given the key role that infants' core primitive social responses play in normal social development, a cognitive-affective architecture emphasizing interactive and parallel systems of cognition and emotion [6] was designed for Kismet to implement such responses. The architecture (see Figure 58.6 in [8]) mainly contains two parts: one comprises the cognitive systems, which are responsible for drives, attention, perception, and goal arbitration, while the other comprises the affective processes, which include affectively appraising incoming events, expressive motor behavior (facial expressions, vocalizations, etc.), and basic emotive responses. Therefore, Kismet's models of emotion interact closely with its cognitive system, affecting behavior and goal arbitration in the architecture [7].
By combining basis facial postures, Kismet produces a continuous range of expressions (five primary emotions (happiness, fear, disgust, sadness, and anger) and three additional ones (excitement, interest, and surprise)) of varying intensities. This is achieved through an interpolation-based technique in a three-dimensional, componential affect space consisting of the valence, arousal, and stance axes [74], adapted from Russell's circumplex model (arousal and valence) [75] and resonating well with the work of Smith and Scott [76]. Breazeal [74] enumerated a number of advantages gained from this affect space, such as making the reading of the robot's facial expressions clearer, since only a single state can be expressed at a time (according to selection); enabling the facial expressions to reflect the nuances of the underlying assessment; and facilitating smooth trajectories through the affect space.
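The interpolation idea can be sketched as a distance-weighted blend over basis postures; the coordinates and pose parameters below are invented for illustration and are not Kismet's actual basis set.

```python
import math

# Illustrative interpolation in a Kismet-style (valence, arousal, stance)
# space. Basis coordinates and pose parameters are assumptions; the real
# system blends motor poses for the robot's face.

BASIS = {  # emotion -> ((valence, arousal, stance), pose parameters)
    "happiness": (( 0.8,  0.3,  0.5), {"lip_corners":  1.0, "brows":  0.2}),
    "sadness":   ((-0.7, -0.3, -0.2), {"lip_corners": -1.0, "brows": -0.4}),
    "surprise":  (( 0.1,  0.9,  0.0), {"lip_corners":  0.0, "brows":  1.0}),
}

def blend(point):
    """Distance-weighted blend of basis poses around the current affect point."""
    weights = {k: 1.0 / (1e-6 + math.dist(point, p)) for k, (p, _) in BASIS.items()}
    total = sum(weights.values())
    pose = {}
    for k, (_, params) in BASIS.items():
        for joint, value in params.items():
            pose[joint] = pose.get(joint, 0.0) + value * weights[k] / total
    return pose

print(blend((0.6, 0.4, 0.3)))  # mostly-happy pose with a hint of surprise
```

Because the blend varies continuously with the affect point, moving through the space yields the smooth expression trajectories the text describes.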
The importance of building an emotional space that allows smooth transitions between discrete emotions was emphasized by the Kismet project, although it did not compare the believability of expressions with smooth and nonsmooth transitions. Moreover, the Kismet project showed that, by using a computational model of emotion, a robot can conduct social interaction with humans in addition to arbitrating its internal affective states [77,78].

WE-4RII's Mental Model.
The core of the mental model of the robot WE-4RII (see Figure 58.10 in [8]) is its emotion model. The dynamics of mental transitions in the WE-4RII mental model can be expressed by equations adopting the form of the equation of motion that describes the movement of objects in dynamics [8].

To express the dynamics of mental transitions, the WE-4RII robot implements equations of emotion, a mood vector, and equations of need (see [79] for more details). Furthermore, the seven basic emotions defined by Ekman [80] are represented as the emotion vector [79,81] in a three-dimensional mental space consisting of the pleasantness, activation, and certainty axes. The seven emotions and their corresponding expressions are mapped into the 3-D mental space, and the regional mapping of WE-4RII's emotions is determined by the emotion vector E passing through each region (see Figure 58.11 in [8]).
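As an illustration of the motion-equation analogy, the sketch below integrates a second-order system per mental-space axis; the inertia, damping, and stiffness coefficients are our assumptions, not WE-4RII's published parameters.

```python
# Second-order "equation of motion" analogy for emotion dynamics
# (our sketch): M*E'' + D*E' + K*E = stimulus, integrated per axis
# of the (pleasantness, activation, certainty) mental space.

def step(E, dE, stimulus, M=1.0, D=1.2, K=0.8, dt=0.05):
    ddE = (stimulus - D * dE - K * E) / M   # acceleration of the mental state
    return E + dE * dt, dE + ddE * dt

E = [0.0, 0.0, 0.0]    # pleasantness, activation, certainty
dE = [0.0, 0.0, 0.0]
for t in range(200):
    stim = [0.6, 0.9, -0.2] if t < 100 else [0.0, 0.0, 0.0]
    for i in range(3):
        E[i], dE[i] = step(E[i], dE[i], stim[i])
print([round(x, 2) for x in E])  # decays back toward neutral after the stimulus
```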
In summary, the mental model of WE-4RII can be computationally implemented [82], as it implements motion-inspired equations to express the dynamics of mental transitions.

PCT-Based Model PRESENCE.
To generate robotic emotional behavior, some researchers have designed computational structures based on PCT on emotions. For instance, a model called PRESENCE ("PREdictive SENsorimotor Control and Emulation"), which is based on PCT, was developed by Moore [83] to improve speech-based human-machine interaction. With PRESENCE, a system can cater to the needs and attention of a user, while the user can make allowances for the needs and intentions of the system. According to Moore [83], cooperative and communicative behaviors are by-products of the recursive hierarchical feedback control structures of this ensemble model. Theories and ideas from domains such as control, neuroscience, bioscience, and psychology laid the foundation for the creation of PRESENCE, including "perceptual control theory" [66], "mirror neurons" [84], "hierarchical temporal memory" [85], and "emulation mechanisms" [86]. To address three fundamental constraints (i.e., energy, entropy, and time) that ultimately determine an organism's ability to survive within an evolutionary framework, PRESENCE was originally designed as an integrated and recursive processing architecture. To facilitate efficient behavior and efficient communication, PRESENCE maximizes the achievements of the system or the user in the interactive environment; it is organized into four layers and is thus inherently recursively nested and hierarchical in structure.
PRESENCE was demonstrated in [83] with a Lego NXT computer model that Moore built to maximize the synchronization of its own behavior with external sources: the robot can sense external sources such as external sounds, sense its own sounds, and generate its own rhythmic behavior. Moore's research shows that PCT can be used not only to explain emotional behavior but also to predict it.

iGrace Computational Model of Emotions.
The iGrace computational model (see Figure 1 in [87] for more details) was designed to enable the companion robot EmI to give a nonverbal emotional response to the speaker's speech. iGrace consists of three principal parts, i.e., the "input" module, the "emotional interaction" module, and the "expression of emotions" module, which enable EmI to receive input information, process it, and determine emotional behavior. Saint-Aimé et al. [87] described these three modules as follows.
The 7-uplets of the understanding module (i.e., the act of language, actions "for the child," concepts "for the child," tense, coherence, phase, and emotional state), the audio signal, and the video signal are taken into account in the "input" module. As such, this module represents the interface for data exchange and communication between the emotional interaction module and the understanding module.
With the "emotional interaction" module, iGrace can generate the emotional state of EmI using discourse information given by "input" as well as its internal cognitive state. Tis module contains 4 submodules, namely moderator, selector of emotional experience, generator of emotional experience, and behavior (see more details in [88,89]), which produce lists Li of pairs (eemo, C (eemo)) involving in four steps (see Figure 2 in [87]) in which C (eemo) denotes an infuence coefcient and eemo denotes an emotional experience.
In the "expression of emotions" module, a list of triplet < tone, posture, facial state> is built to express the emotional state of EmI, in which tone is converted into music notes and postures, and facial expressions of EmI are converted into motor movements.
To sum up, the iGrace computational model has been shown to be computationally implementable in companion robots such as EmI and the new version of EmI [87]. This may result from the fact that iGrace is an instance of the generic model of emotions GRACE [90]. Furthermore, compared to other computational models of emotions, such as FLAME [91], Kismet [7], Greta [92], EMA [70], and GALAAD [93], GRACE is the only model that applies all three fundamental theories characterizing an emotional process, namely appraisal theory, coping theory, and personality theory, according to Saint-Aimé et al. [88].

A Computational System of Emotion xEmotion.
xEmotion, a computational system of emotion, is designed to allow an agent (a robot carrier) to respond most appropriately to specific changes in the environment. According to Kowalczuk et al. [94], implementing the intelligent system of decision-making (ISD) in an autonomous agent or robot can make it operate faster and more efficiently, owing to the ISD's system of emotions, which can be viewed, from a control theory perspective, as an approach based on scheduling variable policies. Covering various psychological theories of emotions, such as the somatic, evolutionary, and appraisal theories of emotion, xEmotion takes into account specific temporal divisions of emotion and, in particular, considers both long-term changes (e.g., personality changes or emotional disorders) and short-term emotions (e.g., expressions or autonomous changes). Furthermore, xEmotion uses (common/real and private/imaginary/individual) wheels/circles of emotion, or the "rainbows" of emotions [95], to interpret and compile emotions. Kowalczuk et al. use a general scheme (see Figure 3 in [94]) to explain how emotions are used as a scheduling variable in the xEmotion system. It takes approximately five big steps to generate emotions in the scheme, namely impression recognition, discovery recognition, generating emotion/generating equalia (these two phases are parallel), generating mood, and available reactions. Six principal components of xEmotion, i.e., autonomous preemotions, expressive subemotions, expressive subequalia, classic emotions, equalia (or private emotions), and mood, are distinguished in [94,96,97].
For xEmotion to be computationally implementable in the agent (robotic carrier), Kowalczuk et al. [94] applied fuzzy sets to the six principal components of xEmotion in the three phases of an emotion process, namely somatic emotions (or preemotions), appraisal of emotions (including subemotions and emotions), and personal emotions (including subequalia, equalia, and mood). The emotional components and their underlying relationships can be found in [98].
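As an illustration of applying fuzzy sets to one emotional component, the sketch below fuzzifies a valence reading; the triangular membership functions are our assumption, since [94] defines its own sets for each component.

```python
# Illustrative fuzzy-set treatment of one emotion component (membership
# functions are assumptions; xEmotion defines its own sets per component).

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzify_valence(v):  # v in [-1, 1]
    return {
        "negative": tri(v, -1.5, -1.0, 0.0),
        "neutral":  tri(v, -0.5,  0.0, 0.5),
        "positive": tri(v,  0.0,  1.0, 1.5),
    }

memberships = fuzzify_valence(0.35)
# the dominant fuzzy label can then act as a scheduling variable for behavior
print(max(memberships, key=memberships.get), memberships)
```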
In summary, emotions in xEmotion are used not only as scheduling variables (for decision-making and forming responses or general behavior) but also as adjustment parameters (in the motivation subsystem). Furthermore, interpreting and using emotions as a scheduling control variable has contributed to the research on, and the implementation of, computational models of emotions for robots.

Our Proposed Computational Model of Robotic Emotions
In this section, we first propose a computational model of emotions for socially interactive robots, especially for robots for a special group of people such as autistic children, called AppraisalCloudPCT (based on a component view of computational models, the appraisal theories on emotions, cloud robotics, and perceptual control theory on emotions), and we then compare our model AppraisalCloudPCT with the five models for robotic emotions revisited in Section 3. The design of the model follows five primary principles:

(1) Principle in achieving a high degree of simulation of human emotions: a computational model of emotions should simulate the whole process (e.g., intention, generation, regulation, and response to a stimulus) of a human emotion

(2) Principle in guaranteeing the computability of a model: a computational model of emotions should be assembled from submodels each of which is computable, so that the model as a whole can be computationally implemented in various robots

(3) Principle in making human-robot interaction more effective, efficient, and pleasant: a computational model of emotions should not only take into account a user's intention (or need), attention, emotional state, the response to the robot, and the impact of the external environment (such as noise, disturbance, and contextual cues) on the user during the interaction but also coordinate the robot to make an appropriate response to the surrounding emotional environment

(4) Principle in promoting the universality of a computational model in socially interactive robots: as more and more socially interactive robots are deployed in therapy and rehabilitation situations, a computational model of emotions should take into account the social and communicative characteristics of a special group of users such as autistic children or elders with dementia

(5) Principle in facilitating sharing information between and learning from socially interactive robots: a computational model of emotions should endow a robot with a more powerful capability of making decisions faster, more appropriately, and more efficiently, given that more and more socially interactive robots will be exposed to various users with different backgrounds and be connected to a substantial Internet of Things (IoT), such as medical IoT with massive medical data

An Overview of the New Model AppraisalCloudPCT.
Based on the five primary principles in designing a model mentioned previously, we designed a new computational model of robotic emotions, AppraisalCloudPCT, as illustrated in Figure 1. The theoretical basis and guiding methodology covered in the proposed model in response to each of the five primary principles are introduced as follows:

(1) The proposed computational model adopts the concepts of perceptual control theory (PCT) on emotions [32] and the PCT-based PRESENCE [83] to achieve simulation of human emotions: in a closed loop as illustrated in Figure 1 (a minimal sketch of this loop is given after this list), a collection of the intention of a robot (i.e., a reference signal) and its achievement (perceived outcome) (i.e., a perceptual signal) will cause an experienced emotion, and at the same time, an output-caused change in the cognitive states and behavior of the robot will affect a user's behavior during the human-robot interaction. In other words, the difference (i.e., a mismatch) between the reference signal and the perceptual signal will immediately result in an error signal, which will give rise both to the emotional behavior and to the emotional thinking of the robot. Emotions with greater intensity and longer duration will arise in connection with a larger error that demands that the robot alter its affect-consequent model more appropriately to correct the error. Moreover, with the computational model, a robot will be endowed with mood and cognitive states, personality, and cloud-based interaction strategies to form its intention. As such, the computational model can highly simulate the whole process (e.g., intention, generation, regulation, and responding to a stimulus) of a human emotion.

(2) The proposed computational model adopts Marsella et al.'s compositional view of model building [5], which stresses that emotional models are often composed of individual "submodels" or "smaller components" that are often shared and can be matched, mixed, or excluded from any given implementation. Five out of the seven component models listed in the appraisal architecture in [5] are adopted in our proposed computational model, namely appraisal variables, the affect-derivation model, the affect-intensity model, emotion/affect, and the affect-consequent model consisting of the cognitive-consequent model and the behavior-consequent model. As illustrated in Figure 1, our proposed model is assembled from more than 15 "submodels." Consequently, when each of them is computable, the computability of our proposed model as a whole can be achieved.

(3) On the one hand, to make human-robot interaction more effective, efficient, or pleasant, the achievement (perceived outcome), e.g., the interpretation of a user's intention, attention, emotional state, and behavior, will influence the appraisal variables in the proposed computational model. On the other hand, to coordinate a robot to respond appropriately to the surrounding emotional contexts (i.e., contextual cues in the environment containing emotional information that might have an impact on a user's interpretation of the behavior of a robot [99-102]), so that the robot can fit its environment better, contextual understanding of the scenarios and the user is taken into account to support the cloud-based interaction strategies in the proposed computational model.
(4) The proposed computational model of emotions takes into account the social and communicative characteristics of a special group of users such as autistic children or elders with dementia, through its submodel called cloud-based interaction strategies, which is supported by two submodels (i.e., "contextual understanding of scenarios and users" and "cloud-based evaluation system") in a cloud medical robot platform, as illustrated in Figure 1. A cloud-based evaluation system may have certain advantages, as mentioned in the research on cloud medical robots [35-38]; one of them is that the data of interaction between a user and a robot can be stored and evaluated in a cloud for further assessment of the social and communicative characteristics of the user. The submodel "contextual understanding of scenarios and users" relies on another two submodels, "local pattern recognition" and "cloud-based pattern recognition," which can provide the interpretation of a user's intention, attention, emotional state, and behavior. Therefore, the proposed computational model of emotions is suitable for socially interactive robots, especially for robots for a special group of users such as autistic children or elders with dementia, which promotes the universality of our model to some extent.

(5) To facilitate sharing information between and learning from socially interactive robots, a cloud medical robot platform is built and assembled in the proposed computational model. With such a platform, information can be shared between robots through the submodel "cloud-based evaluation system," and the capability of interpreting a user and of making decisions can be learned through the submodel "contextual understanding of scenarios and users."
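The closed loop promised in item (1) of this list can be condensed into a toy round-by-round simulation; all submodels here are stubs of ours, and the error-to-intensity and mood-update rules are assumptions rather than the model's actual definitions.

```python
# Minimal sketch of the AppraisalCloudPCT closed loop (our illustration;
# submodels are stubbed, and all numeric rules are assumptions).

def cloud_strategy(mood: float, error: float) -> str:
    # stand-in for the "cloud-based interaction strategies" submodel
    if abs(error) > 0.5:
        return "simplify task and re-engage the child"
    return "continue current activity"

def interaction_round(intention: float, achievement: float, mood: float):
    error = intention - achievement          # reference vs. perceived outcome
    intensity = min(1.0, abs(error))         # larger error -> stronger emotion
    valence = 1.0 if error <= 0 else -1.0    # unmet intention -> negative affect
    # affect-consequent step: update mood and fetch a new strategy
    mood = 0.9 * mood + 0.1 * valence * intensity
    return mood, cloud_strategy(mood, error)

mood = 0.0
for achieved in (0.2, 0.4, 0.8):             # successive rounds of interaction
    mood, strategy = interaction_round(intention=0.8, achievement=achieved, mood=mood)
    print(round(mood, 3), strategy)
```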

Comparison of Models.
This section compares the five computational models for robotic emotions revisited in Section 3 with our proposed computational model (see Table 1 for a summary). The five crucial properties of a computational emotion model, (i) domain-independent, (ii) models mood, (iii) models personality, (iv) data-driven mapping, and (v) ethical reasoning, as listed in a review paper [103], along with one more property, (vi) combining with cloud robotics (we believe this will be a future trend in building computational models of emotions for socially interactive robots), are chosen as the six criteria for comparison. Table 1 shows a comparative assessment between the computational models of emotions for socially interactive robots; as can be inferred from the summary in the table, even satisfying the first five criteria remains a challenge. Great efforts have been made in building our proposed computational model of emotions to meet all six criteria, by adopting the appraisal theories on emotions, perceptual control theory on emotions, a component model view of appraisal models, and cloud robotics. How our proposed computational model meets all six criteria is summarized as follows: (1) to meet the criterion of "domain-independent," our proposed computational model not only takes into account the social and communicative characteristics of every user but also can coordinate a robot implementing our model to respond appropriately to the surrounding emotional contexts; (2) mood is considered as a long-term change in the submodel "mood and cognitive states" of our proposed computational model, and it is impacted by the two submodels "emotion/affect" and "cognitive"; therefore, the second criterion, "models mood," can be met; (3) there is a submodel "personality" in our proposed computational model, such that personality can be modeled; (4) between appraisal variables and emotions, there are two consecutive submodels, the "affect-derivation model" and the "affect-intensity model," in our proposed computational model, which support data-driven mapping of the appraisal variables into emotion intensities; (5) an emotion regulation mechanism is implemented in our proposed computational model through closed-loop emotion modeling and regulation based on perceptual control theory on emotions and through the submodel "cloud-based interaction strategies"; and (6) our proposed computational model combines with cloud robotics by using the submodel "cloud medical robot platform."

Notes to Table 1: (1) a model that satisfies the given property is marked with a tick (√); a model that does not satisfy the given property is marked with a cross (×); and when we were unable to retrieve enough information to determine whether a specific property was met, we use a question mark (?). (2) According to Ojha et al. [103], "domain-independent" means processing and exhibiting emotional responses in various situations, not only in certain kinds of interaction domains; "models mood" means integrating the notion of mood with emotions; "models personality" means integrating the notion of personality; "data-driven mapping" is defined as a data-driven mapping of the appraisal variables into emotion intensities according to the learned relationship between emotions and appraisal variables; and "ethical reasoning" is defined as an emotion regulation mechanism implemented on the basis of ethical reasoning, so that the emotional and behavioral responses of social robots are more "acceptable" in the human community.

The Implementation of Our Model in a Social Robot for Autistic Rehabilitation

A Social Robot for Autistic Rehabilitation.
We developed a socially interactive robot called Dabao for autistic rehabilitation, with which we conducted three preliminary clinical human-robot interaction studies [10,104,105] with Chinese children with ASD. The appearance and functionalities of the robot are demonstrated in Figure 2, and the software architecture is illustrated in Figure 3.
Apart from the tactile sensing [106] and some APP instances [105,107,108] on the touch screen demonstrated in Figure 2, we have developed other deep learning algorithms to endow the robot with a stronger capability in interpreting a user (e.g., an autistic child), such as intention understanding (see Figure 4) and attention recognition (see Figure 5). Furthermore, Table 2 summarizes six major capabilities of the robot to perceive a user, to infer a user's mood, cognitive states, and behavior, and to express itself to a user, all of which can influence the effectiveness, efficiency, and pleasantness of the human-robot interaction.

The Implementation of Our Model in the Social Robot.
As illustrated in Figure 6 (equivalent to Figure 1, except that all of the submodels are marked with different numeric symbols and different color themes, for a better explanation of how our model is implemented in the social robot Dabao developed by us), our proposed model AppraisalCloudPCT consists of 20 compositional submodels (or components of a model). Such a compositional view of model building has certain advantages, one of which is that we can implement the proposed model AppraisalCloudPCT in our social robot by implementing its compositional submodels one by one and then forming the whole model in a closed-loop control.

Figure 6 (caption excerpt): the submodels numbered 1-3, marked in the yellow-brown color theme, constitute a robot's intention of how to appraise an event (i.e., the appraisal patterns of an interaction process), in which the appraisal patterns of the nine appraisal dimensions of the robot's emotion can be affected by the robot's mood and personality and by the interaction strategies; the submodels numbered 16-20, marked in the green color theme, constitute the "cloud medical robot platform"; the submodels numbered 12-13, marked in the cyan color theme, indicate that the robot will not only take into account the impact of the external environment (such as noise, disturbance, and contextual cues) on the user during the interaction but also respond to the surrounding contexts appropriately.
We implement each submodel with mathematical definitions and formulas in our social robot as follows.
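As a minimal illustration of this compositional wiring (independent of the per-submodel mathematics below), consider the following sketch; all names are ours, and only three of the 20 submodels are stubbed.

```python
# Sketch of the compositional wiring (names are ours; Figure 6 numbers
# 20 submodels, of which only a few are stubbed here).

from typing import Protocol

class Submodel(Protocol):
    def __call__(self, state: dict) -> dict: ...

def compose(*submodels: Submodel):
    def run(state: dict) -> dict:
        for m in submodels:      # each submodel reads and updates shared state
            state = m(state)
        return state
    return run

def mood_and_cognition(s):  s["mood"] = s.get("mood", 0.0); return s
def personality(s):         s["ocean"] = s.get("ocean", [0.0] * 5); return s
def intention(s):           s["intention"] = 0.8; return s

loop_body = compose(mood_and_cognition, personality, intention)
print(loop_body({}))   # one pass of the closed loop over the shared state
```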

The 1st Submodel "Mood and Cognitive States".
The equation of mood and the equation of emotion, as defined in [112], are adopted in implementing our proposed model AppraisalCloudPCT. First, the emotion vector E is defined in the PAD mental space consisting of the pleasantness, arousal, and dominance axes as the robot's cognitive state. The mood vector consists of the components M_p and M_a, which denote the pleasantness and arousal components of the mood, respectively. M_p is defined as the integral of the pleasantness component of the equation of emotion, since the pleasantness of mood can be influenced by the current cognitive state. Furthermore, M_a is defined by a Van der Pol equation, since the activation component of the mood vector is similar to a biological rhythm of the human body, such as the internal clock.
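As a concrete illustration, a minimal discretization of these two mood components might look as follows; the damping parameter mu, the frequency omega, and the step size are our assumptions, not values from [112].

```python
# Numeric sketch of the mood components (our discretization; parameters
# assumed): M_p integrates the pleasantness component of emotion, and
# M_a follows a Van der Pol oscillator as an internal-clock-like rhythm.

def mood_step(Mp, Ma, dMa, pleasantness, mu=0.5, omega=1.0, dt=0.01):
    Mp = Mp + pleasantness * dt                      # integral of pleasantness
    ddMa = mu * (1 - Ma**2) * dMa - omega**2 * Ma    # Van der Pol dynamics
    return Mp, Ma + dMa * dt, dMa + ddMa * dt

Mp, Ma, dMa = 0.0, 0.1, 0.0
for _ in range(1000):
    Mp, Ma, dMa = mood_step(Mp, Ma, dMa, pleasantness=0.2)
print(round(Mp, 2), round(Ma, 2))  # slowly rising pleasant mood, oscillating arousal
```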

The 2nd Submodel "Personality".
So far, the Big Five personality traits (i.e., openness (O), conscientiousness (C), extraversion (E), agreeableness (A), and neuroticism (N)), as defined in [113,114], have been the most widely used measure for human and robot personality modeling in the human-robot interaction literature. Three main conclusions can be drawn from the literature review in [115]: (1) extroverts seemingly react more positively during interaction with robots; (2) humans respond more positively to extroverted robots, but this relationship is moderate; (3) humans respond well to robots with similar and/or different personalities. Furthermore, Robert [115] suggested that the effects of context on the impact of robot and human personality be examined in future studies, as it is easy to speculate that the personality of a robot may be more important for a home robot than for one used at work. This is consistent with the contextual approach to personality, whereby a person's personality is best described and understood in the various contexts in which it is placed [116].

Figure 2 (labels): APP instances on the touch screen, including the Chinese chatbot mentioned in [107], the image captioning mentioned in [108], and the FETCS mentioned in [105]; gesture recognition and imitating others; LED display of facial expressions (e.g., happiness, anger, sadness); 21 tactile sensor arrays assembled on the inner surface of the ABS shell of the robot, mentioned in [106]; a RealSense D435 camera; and an NVIDIA Jetson TX2, with algorithms and deep learning running on the TX2 and a cloud platform.

Figure 4: A new task-based framework that enables robots to understand human intentions using visual-NLP semantic information [109]: it includes a language semantics module to extract keywords whether or not the command directive is explicit, a visual object recognition module to identify multiple objects located in front of the robot, and a similarity computation algorithm for inferring the intention based on a given task (i.e., selecting a desired item out of multiple objects on a table and giving it to a particular user among several human participants). The result of the similarity computation is then translated into a structured robot control language, RCL (grasp object to place), to be comprehended by robots. The experimental results demonstrate the ability of the framework to allow robots to grasp objects following the actual intent of vague, feeling, and clear types of instructions. (Panels: the robot extracts information from one of the three types (i.e., vague, feeling, and clear) of instructions from a child and identifies the objects in front of it; the robot infers the intention of the child with a similarity computation algorithm and then transforms it into a structured robot control language; turning itself to face the child, the robot points out a target object with a verbal description of the intention of the child.)
Moreover, users' preferences for robot personalities can be determined by people's stereotyped perceptions of certain jobs and by the background of the robot's role [117]. Therefore, the behavior of the robot may need to be adapted to the user's expectations as to what personality and behavior are consistent with such tasks or roles.
In a recent study [118], researchers found that participants performed better when using a robotic assistant with a personality similar to their own or a human assistant with a different personality. This is in accordance with the results of the systematic evaluation of human and robot personality in healthcare human-robot interaction [119], namely that matching the patient and robot personality based on introversion or extroversion is positively correlated with beneficial results. The research in [119] also found that robot personality traits such as being extroverted, feminine, responsive, amiable, and sociable were positively associated with beneficial outcomes.
Figure 5: The overall framework of a novel gaze-based image caption system for autistic children and the effect of the framework in a gaze-based image caption system [110]: (a) the overall framework describes the region where an autistic child is looking and combines image captioning (based on attention heat maps, it describes the region the child concentrates on) with gaze-following (based on spatial geometry, it predicts areas of attention from the spatial relationship between the map and the line of sight); (c) is more suitable than (b) for enhancing human-robot interaction and promoting the spontaneous language development of autistic children, as adding gaze-following can support a robot in better describing what the child is looking at (general image caption: "The kid is looking at a banana."; gaze-based image caption: "There is a banana on the table, can you see that?").

Not only the emotional factors [120] but also the appraisal patterns of emotion [121] can be affected by the Big Five personality traits. The relationship between the PAD model [41] and the five factors of personality can be derived through the linear regression analysis in [120], and three equations of temperament covering pleasure, arousal, and dominance are summarized in [122] as follows:

P_α = 0.21E + 0.59A + 0.19N,
P_β = 0.15O + 0.30A − 0.57N,
P_γ = 0.25O + 0.17C + 0.60E − 0.32A,

where P_α denotes the value for the pleasant axis (α-axis), P_β denotes the value for the arousal axis (β-axis), and P_γ denotes the value for the dominance axis (γ-axis), respectively, and the five factors of personality O, C, E, A, N ∈ [−1, 1] stand for openness, conscientiousness, extraversion, agreeableness, and neuroticism, respectively. The relationships between the five factors of personality and the appraisal dimensions of emotion can be derived from [121] (p. 519), where 10 main appraisal dimensions in major appraisal theories (Pleasantness, Goal Conduciveness, Effort, Perceived Control, Certainty, Agency-Self, Agency-Others, Agency-Circumstances, Unfairness, and Moral Violation), plus a new appraisal dimension, relationship-involvement, were selected (see the Appendix in [121] for more details). Similarly, nine personality-appraisal relationships (no relationship was found for the appraisals "effort" and "relationship-involvement") in [121] (p. 519) can be summarized; the two reproduced here are

Goal Conduciveness: F_gc = −0.579N + 0.369C,
Perceived Control: F_pc = −1.281N + 0.923E + 1.306C,

and the remaining seven (for Pleasantness F_pl, Certainty F_c, Agency-Self F_as, Agency-Others F_ao, Agency-Circumstances F_ac, Unfairness F_u, and Moral Violation F_mv) take the same linear form in O, C, E, A, and N.
Each relationship indicates how an appraisal dimension depends on a combination of the Big Five personality traits, i.e., the tendency of people with specific personality traits to appraise events along that particular appraisal dimension. For instance, according to the Pleasantness and Goal-Conduciveness relationships, people with low N and high C are more likely to appraise events as pleasant (Pleasantness) and as conducive to important goals (Goal-Conduciveness), although the tendency to appraise the same event along the two appraisal dimensions is not exactly the same. Note that once the values of the Big Five personality traits are determined, the value of each appraisal dimension is also determined.
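In code, the two relationships reproduced above translate directly; the example profile below is invented, and the remaining seven dimensions would be added analogously once their coefficients are taken from [121].

```python
# Appraisal tendencies from Big Five traits, using the two relationships
# reproduced from [121] above (the other seven are analogous linear
# combinations; see [121], p. 519). Trait values lie in [-1, 1].

def appraisal_tendencies(O, C, E, A, N):
    return {
        "goal_conduciveness": -0.579 * N + 0.369 * C,
        "perceived_control":  -1.281 * N + 0.923 * E + 1.306 * C,
    }

# A low-neuroticism, high-conscientiousness profile appraises events as
# more conducive to its goals and as more controllable:
print(appraisal_tendencies(O=0.2, C=0.8, E=0.3, A=0.5, N=-0.6))
```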

5.2.3. The 3rd Submodel "Cloud-Based Interaction Strategies".
The main purpose of this submodel is to output a strategy that the robot can use in the next round of interaction with an autistic child. Adopting the perceptual control theory on emotions, our proposed model AppraisalCloudPCT is designed in the first place to enable many rounds of recursive interaction between a robot and an autistic child, so that the interaction becomes more effective, more efficient, and more satisfying for the child. By "strategy," we mean that, given the specific estimation of the valence, arousal, and engagement levels of the child supported by the submodel "cloud-based evaluation system" and the contextual understanding of the interactive scenario and the child supported by the submodel "contextual understanding of scenarios and users," the robot will be able to alter its mood and personality to match the status of the child and the interactive context, for a better round of interaction.
As mentioned above in 5.2.2, for better performance in human-robot interaction, a robot should have a personality similar to that of its human participants, and the effects of context should be taken into consideration when designing a robotic personality. In this study, it is therefore necessary for the robot to have knowledge of the personality profile of the autistic child (supported by the 19th submodel "cloud-based evaluation system"; as illustrated in Figure 7, the personality profile can be provided by the child's parents) and to understand the interactive scenario and the child in the surrounding context (supported by the 18th submodel "contextual understanding of scenarios and users"; please refer to it for more details).
Consequently, this submodel will output cloud-based interaction strategies as follows.

Strategy one: to match the robot's personality with that of the child, the personality profile (i.e., rating scales of O, C, E, A, N between −1 and 1) of the autistic child who will interact with the robot is obtained first, and the robot's personality is then matched to it. Once the personality of the robot is altered, the emotional tendency that the robot will experience and the appraisal patterns that the robot will use can be predicted by using the equations in 5.2.1 and 5.2.2.

Strategy two: since context affects a user's perception of not only the emotions but also the personality of a robot, the effects of context should be taken into consideration. First, the role that the robot plays in the task of the HRI scenario, and what kind of personality the autistic child expects to be consistent with such a task or role, should be identified. Then, the personality of the robot should be modified to meet the child's expectation.

Strategy three: the outcome of "contextual understanding of scenarios and users" should be taken into account, given that noise, disturbance, and contextual cues may influence an autistic child's mood and his/her judgement of the robot's emotions. To do that, first, the emotional valence of the contextual cues is obtained. Then, the robot's mood is made congruent, to some extent, with the emotional valence of the contextual cues. Third, if noise and disturbance are detected in the HRI scenario, the robot's estimation of the child's valence and arousal levels provided by the submodel "cloud-based evaluation system" should be rectified to a degree that depends on the amount of noise and disturbance.
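The following minimal sketch shows how the three strategies could be composed in code. The class names, fields, blending weights, and the blend() helper are illustrative assumptions rather than specifications from this article.

```python
from dataclasses import dataclass

@dataclass
class Context:
    expected_traits: list   # personality the child expects for this task/role
    cue_valence: float      # emotional valence of contextual cues, in [-1, 1]
    noise_level: float      # detected noise/disturbance, in [0, 1]

def blend(a, b, w=0.5):
    """Weighted blend of two O,C,E,A,N vectors (hypothetical helper)."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]

def plan_next_round(child_ocean, est_valence, est_arousal, ctx: Context):
    # Strategy one: match the robot's personality to the child's profile.
    robot_ocean = list(child_ocean)
    # Strategy two: adapt it toward the personality the child expects,
    # given the robot's role in the current task.
    robot_ocean = blend(robot_ocean, ctx.expected_traits)
    # Strategy three: keep the robot's mood congruent with the contextual
    # cues, and rectify the valence/arousal estimates in proportion to
    # the detected noise and disturbance.
    robot_mood = ctx.cue_valence
    k = 1.0 - ctx.noise_level
    return robot_ocean, robot_mood, (est_valence * k, est_arousal * k)

print(plan_next_round([0.2, 0.6, -0.3, 0.4, -0.5], 0.4, 0.7,
                      Context([0.5, 0.5, 0.5, 0.5, -0.5], 0.3, 0.1)))
```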

5.2.4. The 4th Submodel "Intention". "Intention" in this submodel means how the robot intends to appraise an event (i.e., the appraisal patterns of an interaction process), based on the robot's mood and personality and in consideration of an interaction strategy for the next round of interaction. The main purpose of this submodel is to map the outputs of the first three submodels, namely, "mood and cognitive states," "personality," and "cloud-based interaction strategies," into appraisal patterns, which can be defined as follows:

F_intention = (F_pl + Δ_pl, F_gc + Δ_gc, F_pc + Δ_pc, F_c + Δ_c, F_as + Δ_as, F_ao + Δ_ao, F_ac + Δ_ac, F_u + Δ_u, F_mv + Δ_mv), (15)

where F_pl, F_gc, F_pc, F_c, F_as, F_ao, F_ac, F_u, F_mv represent Pleasantness, Goal Conduciveness, Perceived Control, Certainty, Agency-Self, Agency-Others, Agency-Circumstances, Unfairness, and Moral Violation, respectively, as defined in equations (6)-(14) in 5.2.2, and Δ_pl, Δ_gc, Δ_pc, Δ_c, Δ_as, Δ_ao, Δ_ac, Δ_u, Δ_mv represent the impact of the two submodels "mood and cognitive states" and "cloud-based interaction strategies" on the tendency of appraising events along each particular appraisal dimension.

5.2.5. The 5th Submodel "Appraisal Variables".
As mentioned in 4.1.2, in a closed loop as illustrated in Figure 1, the combination of the robot's intention (i.e., a reference signal) and its achievement (perceived outcome) (i.e., a perceptual signal) gives rise to an experienced emotion, and at the same time, an output-caused change in the cognitive states and behavior of the robot will affect the user's behavior during the human-robot interaction. In other words, the difference (i.e., a mismatch) between the reference signal and the perceptual signal immediately results in an error signal, which gives rise both to the emotional behavior and to the thinking of the robot.
Appraisal variables are defined as the set of specific judgments by which the robot can generate different emotional responses. The main purpose of this submodel is to output the error signal (i.e., the mismatch between the collection of the robot's intention and its achievement (perceived outcome)) as appraisal variables. Here, the error signal can be defined as follows:

F_error = F_intention − F_achievement, (16)

where F_intention represents the collection of the robot's intention as defined in equation (15), and F_achievement represents the achievement (perceived outcome).

Figure 7: Three layers of the modified PPA-net based on the work in [123]: (1) feature fusion is performed in the feature layer using features from three modalities (visual, audio, and tactile); (2) the context layer first uses behavioral scores of the child's verbal, motor, and mental abilities to augment the input features, using autism rating scales such as CARS2 [124], ADOS-2 [125], and ADI-R [126]; then, the GPA-net (group-level network) is trained and used to initialize the personalized PPA-net weights at the personality, gender, and individual level (using cloning); (3) the third layer is the inference layer, in which the child-specific estimation of valence, arousal, and engagement levels is performed.
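Returning to the error signal, a minimal sketch of equation (16): the componentwise mismatch between the intention vector and the perceived achievement over the nine appraisal dimensions (the example values are illustrative).

```python
import numpy as np

def error_signal(f_intention: np.ndarray, f_achievement: np.ndarray) -> np.ndarray:
    """F_error = F_intention - F_achievement over the nine appraisal dimensions."""
    return f_intention - f_achievement

f_int = np.array([0.4, 0.5, 1.2, 0.1, 0.0, 0.0, 0.0, -0.2, 0.0])
f_ach = np.array([0.1, -0.3, 0.8, 0.1, 0.0, 0.2, 0.0, -0.2, 0.0])
print(error_signal(f_int, f_ach))
```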
5.2.6. The 6th Submodel. This submodel determines how the robot will react emotionally. According to Itoh et al. [112], the emotion vector E = (E_p, E_a, E_d) can be expanded into the second-order differential equation shown in equation (17):

M·(d²E/dt²) + Γ·(dE/dt) + K·E = F_EA, (17)

where M, Γ, K, and F_EA represent the emotional inertia matrix, the emotional viscosity matrix, the emotional elasticity matrix, and the emotional appraisal, respectively. The emotional appraisal F_EA stands for the total result of appraising the appraisal variables (i.e., the error signal F_error). According to Itoh et al. [112], by changing the emotional coefficient matrices, the robot can express different reactions to the same stimulus.
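A minimal sketch of equation (17), integrated with a forward-Euler step. The coefficient matrices below are illustrative placeholders; Itoh et al. [112] tune them to shape the robot's reactions.

```python
import numpy as np

def step_emotion(E, dE, F_EA, M, Gamma, K, dt=0.05):
    """One Euler step of the second-order emotion dynamics in PAD space."""
    ddE = np.linalg.solve(M, F_EA - Gamma @ dE - K @ E)
    return E + dt * dE, dE + dt * ddE

M = np.eye(3)               # emotional inertia
Gamma = 0.8 * np.eye(3)     # emotional viscosity (damping)
K = 2.0 * np.eye(3)         # emotional elasticity (pull back toward neutral)
E, dE = np.zeros(3), np.zeros(3)
F_EA = np.array([0.6, 0.4, 0.1])   # appraisal of the current error signal
for _ in range(100):
    E, dE = step_emotion(E, dE, F_EA, M, Gamma, K)
print(E)   # emotion vector (E_p, E_a, E_d) settling toward K^-1 @ F_EA
```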

5.2.7. The 7th Submodel "Affect-Intensity Model". This submodel specifies the strength of the emotional response resulting from a specific appraisal. As mentioned in 4.1.2, emotions with greater intensity and longer duration arise in connection with a larger error, which demands that the robot alter its affect-consequent model more substantially to correct the error. Therefore, the larger the error signal F_error becomes, the greater the intensity and the longer the duration of the emotion, and the stronger the emotional response will be.

5.2.8. The 8th Submodel "Emotion/Affect".
For each discrete emotion the robot will be experiencing, the emotion vector E = (E_p, E_a, E_d) can be mapped into the PAD mental space consisting of the pleasantness, arousal, and dominance axes.
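A minimal sketch of such a mapping: each discrete emotion label is a prototype point in PAD space, and the nearest prototype names the current emotion vector. The coordinates below are illustrative, not taken from [112].

```python
import numpy as np

PAD_PROTOTYPES = {
    "joy":     ( 0.8,  0.5,  0.4),
    "anger":   (-0.5,  0.6,  0.3),
    "sadness": (-0.6, -0.3, -0.4),
    "fear":    (-0.6,  0.6, -0.5),
    "neutral": ( 0.0,  0.0,  0.0),
}

def label_emotion(E):
    """Name the emotion vector E = (E_p, E_a, E_d) by its nearest prototype."""
    return min(PAD_PROTOTYPES,
               key=lambda k: np.linalg.norm(np.array(PAD_PROTOTYPES[k]) - E))

print(label_emotion(np.array([0.3, 0.2, 0.05])))
```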

5.2.9. The 9th Submodel "Cognitive-Consequent Model".
This submodel determines how affect alters the nature or content of cognitive processes such as the robot's beliefs, desires, and intentions. As mentioned above, an error signal (i.e., a mismatch between the intention of the robot and its achievement (perceived outcome)) will result in the robot's intention to correct the error. How strong the intention to correct the error is depends on how large the error is. Furthermore, as the robot experiences an emotion, its mood will be affected to some extent.

5.2.10. The 10th Submodel "Behavior-Consequent Model".
This submodel summarizes how affect alters the robot's observable physical behavior, such as facial expressions. As described in Table 2 in Section 5.1, our robot is equipped with six key capabilities in interactive scenarios with Chinese autistic children, and it can express itself to the children through facial expressions, gestures, speech, etc. In the interactive scenarios, the robot should alter its observable physical behavior according not only to the emotion it is experiencing but also to the three cloud-based interaction strategies described in 5.2.3.

5.2.11. The 11th Submodel "User Behavior during HRI".
In the child-robot interaction scenarios (e.g., having a conversation, hugging, playing games), an autistic child will generate certain behavior to adapt to, finish, or withdraw from the child-robot interaction. Such behavior (e.g., gaze regulation, facial expressions, hand and body gestures, verbal expression) is not only a product of the child-robot interaction but can also be affected by the outward behavior of the robot as defined in the submodel "behavior-consequent model."

5.2.12. The 12th Submodel "Noise and Disturbance".
Noise in this submodel is defined as noise coming from the surrounding contexts (e.g., ambient noise, or human voices other than the voice of the autistic child during the child-robot conversation). Disturbance is defined as any unexpected event that has an adverse impact on the child-robot interaction, such as a heavy push on the robot or the autistic child being forced by somebody to end the child-robot interaction early. Both noise and disturbance can be detected by the sensors that perceive the child and the environment and by the self-checking sensors (e.g., torque sensors) inside the robot.

5.2.13. The 13th Submodel "Contextual Cues".
Robot faces can be viewed in the same way as human faces; according to [102], users' perceptions of a robot's simulated emotional expressions can be affected by different emotional surrounding contexts (i.e., consistent or inconsistent classical music, or BBC news). Furthermore, when there is an emotional context, people recognize the facial expressions of the robot better when the emotional valence of the environment is consistent with the robot's facial expressions than when it is not [99-101]. Consequently, it is important for the robot to perceive the emotional valence (i.e., contextual cues) of the surrounding contexts (e.g., sound, music, pictures/posters on the wall, video clips on the TV) in the interactive scenarios. As such, contextual cues will be considered, collected, and added to our proposed model AppraisalCloudPCT in this submodel.

5.2.14. The 14th Submodel "Multimodal Sensing". In this submodel, the robot perceives the autistic child and senses the environment through various sensors (e.g., camera, microphone arrays, tactile sensing arrays, infrared sensor) and multiple modalities (e.g., visual, auditory, and tactile sensing). The collection of sensor data in this submodel feeds the two submodels "local pattern recognition" and "cloud-based pattern recognition" and is uploaded to the cloud medical robot platform, more specifically, to the submodel "cloud-based evaluation system."

5.2.15. The 15th Submodel "Local Pattern Recognition".
In this submodel, our proposed model AppraisalCloudPCT outputs the results of the local pattern recognition (i.e., processes that run on the NVIDIA Jetson TX2 inside the robot body, as illustrated in Figure 3), and the results will be uploaded to the cloud medical robot platform to facilitate the two submodels, i.e., the cloud-based evaluation system and the contextual understanding of scenarios and users.
The output of tactile sensing is defined as TS = (P_i, TB_j), where P_i is the i-th position of the robot body being touched by the autistic child and TB_j is the j-th type of touch behavior. The output of attention prediction (i.e., gaze and head direction estimation) is defined as AP = (dl, dr, dh), where dl and dr are the gaze directions of the left and right eyes of the autistic child, respectively, and dh represents the head direction.
The output of gesture recognition is defined as GR = (HG_i, BG_j), where HG_i is the hand gesture of the autistic child, HG_i ∈ {OK, Peace, Punch, Stop, Nothing}, and BG_j is the body gesture of the child, BG_j ∈ {Standing, Walking, Running, Jumping, Sitting, Squatting, Kicking, Punching, Waving, None}.
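The three outputs defined above can be transcribed directly into typed structures; this minimal sketch mirrors the definitions of TS, AP, and GR (the field names are ours).

```python
from dataclasses import dataclass

HAND_GESTURES = {"OK", "Peace", "Punch", "Stop", "Nothing"}
BODY_GESTURES = {"Standing", "Walking", "Running", "Jumping", "Sitting",
                 "Squatting", "Kicking", "Punching", "Waving", "None"}

@dataclass
class TactileSensing:          # TS = (P_i, TB_j)
    position: int              # i-th body position being touched
    touch_behavior: str        # TB_j, the type of touch behavior

@dataclass
class AttentionPrediction:     # AP = (dl, dr, dh)
    gaze_left: tuple           # gaze direction of the left eye
    gaze_right: tuple          # gaze direction of the right eye
    head_direction: tuple      # dh

@dataclass
class GestureRecognition:      # GR = (HG_i, BG_j)
    hand: str
    body: str
    def __post_init__(self):
        assert self.hand in HAND_GESTURES and self.body in BODY_GESTURES

print(GestureRecognition(hand="OK", body="Waving"))
```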

5.2.16. The 16th Submodel "Cloud-Based Pattern Recognition". In this submodel, our proposed model AppraisalCloudPCT outputs the results of the cloud-based pattern recognition (i.e., processes that run on the cloud, as illustrated in Figure 3), and the results are uploaded to the cloud medical robot platform to facilitate the three submodels, i.e., the cloud-based evaluation system, the contextual understanding of scenarios and users, and the information sharing between and learning from robots.
The output of image captioning is defined as IC = (Ob, Pr, At), where Ob represents the object the autistic child concentrates on, whose region can be represented by an attention heat map of an image captured by the robot camera, Pr represents the preposition, and At represents the attributes of the object.
The output of intention understanding is defined as IU = (Ins_i, TO, DP, RCL), where Ins_i is one of the three types of natural language instructions given by the autistic child, Ins_i ∈ {Clear Type, Vague Type, Feeling Type}, TO represents the target object among multiple objects in front of the robot, DP represents the delivery place to which the target object should be delivered, and RCL is the structured robot control language comprehensible to robots, whose format in this paper is "Grasp TO to DP."
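A minimal sketch of emitting the RCL string in the "Grasp TO to DP" format defined above (the example objects are illustrative).

```python
def to_rcl(target_object: str, delivery_place: str) -> str:
    """Render the intention as the structured robot control language."""
    return f"Grasp {target_object} to {delivery_place}"

print(to_rcl("red cup", "table"))   # -> "Grasp red cup to table"
```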
5.2.17. The 17th Submodel "Achievement (Perceived Outcome)". In this submodel, we categorize the achievement (perceived outcome) into 5 types:

① OutcomeType1: the "Friendly vs. Unfriendly" type. For example, "Friendly" in the outcome of "Tactile Sensing" and "Gesture Recognition" means that the attitude of the autistic child towards the robot is interpreted as friendly; an extremely friendly outcome, a neutral outcome, and an extremely unfriendly outcome of this type are scored 1, 0, and −1, respectively.

② OutcomeType2: the "Positive vs. Negative" type. For example, "Positive" in the outcome of "Facial Expression Recognition," "Natural Language Processing," and "Contextual Cues" means that the emotional valence is positive (e.g., an output of "dislike" in "Natural Language Processing" will be categorized as "Negative"); an extremely positive outcome, a neutral outcome, and an extremely negative outcome of this type are scored 1, 0, and −1, respectively.

③ OutcomeType3: the "Valid vs. Invalid" type. For example, "Valid" in the outcome of "Image Captioning" and "Intention Understanding" means that the autistic child reacts positively after the robot verbally describes the objects in the image or verbally states the intention in the interactive scenario; an extremely valid outcome, a no-feedback outcome, and an extremely invalid outcome of this type are scored 1, 0, and −1, respectively.

④ OutcomeType4: the "Focused vs. Distracted" type. For example, "Focused" in the outcome of "Attention Prediction" means that, during the human-robot interaction, the robot can predict that the autistic child has focused on one or two objects in the interactive scenario; on the contrary, "Distracted" means that the gaze and head direction of the child cannot be fixed on one or two objects but shift from one object to another too often, and "None" means the child cannot focus on any object; an extremely focused outcome, a "None" outcome, and an extremely distracted outcome of this type are scored 1, 0, and −1, respectively.

⑤ OutcomeType5: the "Normal vs. Unnormal" type. For example, "Unnormal" in the outcome of "Action Recognition" and "Noise and Disturbance" means that the robot detects some abnormal/unhealthy behavior of the child (e.g., walking on tiptoe or having back pain) or some noise/disturbance in the interactive scenario; a normal outcome and an extremely unnormal outcome of this type are scored 0 and −1, respectively.
Note that the probability of most or all of these outcome types occurring simultaneously is very low; usually only a few of them will occur. For each kind of pattern recognition (i.e., the pattern recognition in the submodels "local pattern recognition" and "cloud-based pattern recognition") and each kind of sensing of the environment (i.e., the sensing in the submodels "noise and disturbance" and "contextual cues"), as described in the submodels above, the outcome value is mapped into [−1, 1] or [−1, 0] using fuzzy sets, depending on which type the outcome is categorized as.
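A minimal sketch of this two-step computation, using crisp stand-ins for the fuzzy-set membership values and hypothetical weights e_1 to e_5; the outcome scores are combined into the perceived appraisal values PF_x by the weighted sums defined just below.

```python
def score_outcomes(friendly, positive, valid, focused, normal):
    """Crisp stand-ins for the paper's fuzzy-set outcome values OT_1..OT_5,
    each in [-1, 1] (OT_5 in [-1, 0])."""
    return [friendly, positive, valid, focused, normal]

def perceived_appraisal(ot, weights):
    """PF_x = e_1*OT_1 + e_2*OT_2 + e_3*OT_3 + e_4*OT_4 + e_5*OT_5."""
    return sum(e * o for e, o in zip(weights, ot))

ot = score_outcomes(friendly=0.7, positive=0.5, valid=1.0, focused=0.8, normal=0.0)
weights_ao = [0.3, 0.3, 0.2, 0.1, 0.1]   # hypothetical e_1..e_5 for PF_ao
print(perceived_appraisal(ot, weights_ao))
```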
Similarly, PF_gc, PF_pc, PF_c, PF_as, PF_ao, PF_ac, PF_u, and PF_mv can be defined in the same way; for example,

PF_ao = e_1 · OT_1 + e_2 · OT_2 + e_3 · OT_3 + e_4 · OT_4 + e_5 · OT_5,

where OT_1, ..., OT_5 are the mapped values of the five outcome types and e_1, ..., e_5 are their weights.

5.2.18. The 18th Submodel "Contextual Understanding of Scenarios and Users". As illustrated in Figure 6, the outcome of each kind of pattern recognition and of sensing the environment is uploaded to the "cloud medical robot platform," more specifically, to this submodel and to the next submodel, "cloud-based evaluation system." In this submodel, our proposed model AppraisalCloudPCT outputs the outcome of "contextual understanding of scenarios and users," which is defined as CUSU = (US, UU), where US represents the understanding of scenarios provided mainly by the outputs of image captioning and of sensing the environment (i.e., combining scene description with the sensing of noise and disturbance and of contextual cues), and UU represents the understanding of users provided mainly by the outputs of the other local and cloud-based pattern recognition (i.e., gaze estimation, intention, gestures).

5.2.19. The 19th Submodel "Cloud-Based Evaluation System".
The importance of this submodel lies in providing insights into both the cognitive and behavioral status of an autistic child, and into the child's intention to engage with the robot, to the submodel "cloud-based interaction strategies." In this submodel, a personalized machine learning (ML) framework, the personalized perception of affect network (PPA-net) developed by an MIT research group [123], is adopted. As illustrated in Figure 7, in the modified PPA-net, the group-level perception of affect network (GPA-net) is trained with the data extracted from the autism rating scales provided by the child's doctor or therapist and from the personality profile of the child provided by the child's parents. Consequently, by using the modified PPA-net, this submodel can automatically provide a continuous and simultaneous estimation of the engagement levels and affective states (i.e., arousal and valence) of an autistic child to the submodel "cloud-based interaction strategies."

5.2.20. The 20th Submodel "Information Sharing Between and Learning from Robots". As mentioned earlier in 4.1.2, one advantage of research on cloud medical robots is that data on the interaction between a user and a robot can be stored and evaluated in a cloud for further assessment of the social and communicative characteristics of the user. With the cloud medical robot platform, in this submodel, information (e.g., the personality profile of each autistic child) can be shared between robots with the support of the submodel "cloud-based evaluation system," and the capability of interpreting a user and of making decisions can be learned with the support of the submodel "contextual understanding of scenarios and users."

6. Conclusions, Discussion, and Future Work

6.1. Conclusions. In this article, we presented a novel computational model of emotions, AppraisalCloudPCT, for socially interactive robots, especially robots for a special group of people such as autistic children. The model takes into account the social and communicative characteristics of autistic children so that it can fit their needs. This mainly results from the fact that our proposed model not only has solid theoretical grounding, built on a component view of computational models, the appraisal theories on emotions, cloud robotics, and the perceptual control theory on emotions, but also can be implemented in a social robot developed by us for autistic rehabilitation by adopting the mood equation, emotion equation, and personality equations.
Moreover, compared with other significant computational models of emotions for socially interactive robots, our proposed model AppraisalCloudPCT has a number of merits. First, it supports sufficient rounds of recursive interaction between a robot and an autistic child, so that the interaction becomes more effective, more efficient, and more satisfying for the child. Second, with our proposed model, a robot can simulate the whole process of human emotion (e.g., generation, regulation, and responding to a stimulus) to a great extent. Third, our proposed model facilitates sharing information between and learning from various socially interactive robots. Last but not least, our proposed model is highly computable, making it suitable for implementation in various socially interactive robots.

6.2. Limitations. Our proposed model AppraisalCloudPCT is designed based on Marsella et al.'s compositional view of model building [5], which stresses that emotional models are often composed of individual "submodels" or "smaller components" that can be matched, mixed, or excluded from any given implementation and are often shared. According to Marsella et al. [5], components may be evaluated and subsequently abandoned or improved through ongoing evaluations before the final version of the model is designed. Although our model is completely designed, there is still room for finding alternative or better mathematical definitions, equations, or algorithms for realizing each individual submodel.

6.3. Discussion. In this article, we proposed a novel computational model of emotions called AppraisalCloudPCT and elaborated on how to implement it in a socially interactive robot we developed for autistic rehabilitation. Several points are worth addressing, as follows.
First of all, this study aims specifically at designing a computational model of emotions for autistic child-robot interaction, for the following three reasons. (1) Although minimal progress has been made in advancing the clinical use of robotics in ASD interventions in clinical settings [13], applying robots to autism interventions has still achieved a number of targets [11], and robots can potentially be applied to 24 of the 74 ASD objectives in the "eight domains" mentioned in Section 1. (2) Modeling of emotions is of critical importance for robots when interacting socially with humans [8]. This is because a robot's emotional responses are determined by its computational model of emotions, in light of its own internal cognitive-affective state and its interactions with the external environment [7]. (3) There are four world-leading research groups with pioneering work in promoting social robots as useful tools in autism therapy, but none of them has designed or applied computational models of emotions for the social robots used in their autism therapy studies.
Second, in Section 4.2, we chose the five crucial properties of a computational emotion model listed in [103], along with one more property, i.e., combining with cloud robotics, as the six criteria for comparison. We believe that "combining with cloud robotics" is a crucial property of a computational emotion model and a fair criterion for comparing such models, helping to make robots smarter and more satisfying to users as well as to promote sales in the service robot market, for three reasons. (1) As mentioned before, a computational model of emotions should endow a robot with a more powerful capability of making decisions faster, more appropriately, and more efficiently, given that more and more socially interactive robots will be exposed to various users with different backgrounds and connected to a substantial Internet of Things (IoT), such as medical IoT with massive medical data. As "combining with cloud robotics" can facilitate sharing information between and learning from socially interactive robots, we believe this property will be crucial in building computational models. (2) Given that other crucial properties, such as (iv) data-driven mapping and (v) ethical reasoning, are heavily data-driven and demand great computing power, "combining with cloud robotics" could be an efficient, if not the best, way to guarantee that the data on interaction between a user and a robot can be stored and evaluated in a cloud for further assessment of the social and communicative characteristics of the user. (3) On the one hand, more and more socially interactive robots implement artificial intelligence (AI) algorithms or deep learning (DL) (e.g., the modified PPA-net implemented in our own robot), reinforcement learning (RL), or deep reinforcement learning (DRL) frameworks to make them smarter and better received by users; on the other hand, deploying these in the main controller of a robot rather than in a cloud increases the hardware cost due to the increased computational load. Since the parents of autistic children usually bear a heavy burden, not only mental but also financial, "combining with cloud robotics" would be necessary for offering robots at acceptable prices in the service robot market to those parents.
Third, our proposed model AppraisalCloudPCT could be implemented in the socially interactive robot that we developed for autistic rehabilitation. Such a model could also be adapted to serve people with different special needs, e.g., elders with dementia. This is because our proposed computational model of emotions takes into account the social and communicative characteristics of a special group of users, such as autistic children or elders with dementia, through its submodel "cloud-based interaction strategies," which is supported by two submodels (i.e., "contextual understanding of scenarios and users" and "cloud-based evaluation system") in a cloud medical robot platform, as illustrated in Figure 1. As mentioned before, a cloud-based evaluation system enables the data of interaction between a user and a robot to be stored and evaluated in a cloud for further assessment of the social and communicative characteristics of the user. Furthermore, the submodel "contextual understanding of scenarios and users" relies on two other submodels, "local pattern recognition" and "cloud-based pattern recognition," which can provide the interpretation of a user's intention, attention, emotional state, and behavior. Therefore, the proposed computational model of emotions is suitable for socially interactive robots, particularly robots for a special group of users such as autistic children or elders with dementia, which promotes the universality of our model to some extent. Moreover, our proposed computational model also meets the criterion of being "domain-independent," i.e., processing and exhibiting emotional responses in various situations as well as in particular kinds of interaction domains, since it can coordinate a robot implemented with our model to respond to the surrounding emotional contexts appropriately. For our proposed model to be adapted to socially interactive robots serving elders with dementia, a few steps would be necessary, as follows. (1) As illustrated in Figure 7, the group-level perception of affect network (GPA-net) in the modified PPA-net would be trained with data extracted from dementia rating scales, such as the Mini-Mental State Examination (MMSE) [127], provided by the elder's doctor or therapist, and from the personality profile of the elder provided by his/her offspring or close friends. Consequently, by using the modified PPA-net, this submodel can automatically provide a simultaneous and continuous estimation of the affective states (i.e., valence and arousal) and engagement levels of the elder to the submodel "cloud-based interaction strategies." (2) With the support of the two submodels "cloud-based evaluation system" and "contextual understanding of scenarios and users," which provide the specific estimation of the valence, arousal, and engagement levels of the elder and the contextual understanding of the interactive scenario and the elder, respectively, the robot will be able to alter its mood and personality to match the status of the elder and the interactive context, using the three interaction strategies in the submodel "cloud-based interaction strategies," for a better round of interaction. (3) Our proposed model is designed in the first place to enable many rounds of recursive interaction between a robot and a user.
Based on the feedback (e.g., the interpretation of the elder's intention, attention, emotional state, and behavior) provided by the elder during/after the human-robot interaction, as summarized by the submodel "achievement (perceived outcome)," the three interaction strategies in the submodel "cloud-based interaction strategies" can be modified accordingly. As such, after many rounds of recursive interaction, the interaction will become more effective, more efficient, and better at satisfying the needs of the elder.

6.4. Future Work. Future studies should examine how our model performs in various robots and in more interactive scenarios.

Data Availability
All data included in this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.